What are survival data anyway?

Examples from cancer

  • Time from diagnosis to death
  • Time from surgery to recurrence of disease
  • Time from start of treatment to progression of disease
  • Time from response to recurrence of disease

Examples from other fields

  • Time from HIV infection to development of AIDS
  • Time from diagnosis with heart disease to heart attack
  • Time from discharge from a rehabilitation facility to recurrence of substance abuse
  • Time from birth to initiation of sexual activity
  • Time from production to machine malfunction

A rose by any other name…

Because time-to-event data are common in many fields, survival analysis also goes by other names, including:

  • Reliability analysis
  • Duration analysis
  • Event history analysis
  • Time-to-event analysis

What is censoring?

Censoring occurs when the event of interest is not observed during a subject's period of follow-up, so their exact event time is unknown.

But isn’t this just binary data??

  • Binary data do not change depending on the time of analysis, e.g. 5-year survival has the same value whether it is analyzed at 5 years and 1 day, 5 years and 2 days, or 6 years. Either a participant died by 5 years or they didn't.

  • Time-to-event data may have different values depending on the time of analysis, e.g. overall survival will have different values depending on whether it is analyzed at 5 years and 1 day or at 6 years, since additional participants can die between those two time points.
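
A small R sketch makes the contrast concrete. The death times below are hypothetical. The helper `observe_at()` is invented for illustration and assumes everyone enrolled at time 0 and is followed until the analysis date. The binary 5-year indicator is identical at both analysis times. The observed (time, status) pairs change because participant 3 dies between them.

```r
# Hypothetical death times in years from enrollment (Inf = no death in the period shown).
# Assumes every participant enrolled at time 0 and is followed until the analysis time.
death_time <- c(2.0, 4.5, 5.5, 7.0, Inf)

# Hypothetical helper: what we would record if we analyzed the data at analysis_time
observe_at <- function(analysis_time) {
  data.frame(
    dead_by_5y = death_time <= 5,                         # binary 5-year outcome
    time       = pmin(death_time, analysis_time),         # observed follow-up time
    status     = as.integer(death_time <= analysis_time)  # 1 = death observed, 0 = censored
  )
}

at_5y1d <- observe_at(5 + 1 / 365)  # analysis at 5 years and 1 day
at_6y   <- observe_at(6)            # analysis at 6 years

identical(at_5y1d$dead_by_5y, at_6y$dead_by_5y)  # TRUE: binary outcome is unchanged
at_5y1d$status                                   # 1 1 0 0 0
at_6y$status                                     # 1 1 1 0 0: participant 3 died at 5.5 years
```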

Right censoring example

Reasons for censoring

A subject may be censored due to:

  • Loss to follow-up
  • Withdrawal from study
  • No event by end of fixed study period
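
The R code below is a sketch of how these reasons turn into censored observations. It derives an observed time and event indicator from calendar dates. The study end date, the column names (`enroll_date`, `event_date`, `last_contact`) and all dates are hypothetical. Each subject's follow-up ends at their event date if an event was observed. Otherwise it ends at their last contact (loss to follow-up or withdrawal) or at the end of the study.

```r
# Hypothetical study; every date here is invented for illustration
study_end <- as.Date("2024-12-31")

dat <- data.frame(
  id           = 1:4,
  reason       = c("event", "lost to follow-up", "no event by study end", "withdrew"),
  enroll_date  = as.Date(c("2018-03-01", "2019-06-15", "2020-01-10", "2021-09-01")),
  event_date   = as.Date(c("2020-05-20", NA, NA, NA)),         # NA = no event observed
  last_contact = as.Date(c(NA, "2021-02-01", NA, "2023-04-30")) # NA = followed to study end
)

# Censoring date: last contact for subjects who left early, otherwise the study end
dat$censor_date <- dat$last_contact
dat$censor_date[is.na(dat$censor_date)] <- study_end

# Event indicator: 1 if an event date was recorded, 0 if censored
dat$status <- as.integer(!is.na(dat$event_date))

# Follow-up ends at the event date if observed, else at the censoring date
end_date <- dat$censor_date
end_date[dat$status == 1] <- dat$event_date[dat$status == 1]
dat$time_yrs <- round(as.numeric(end_date - dat$enroll_date) / 365.25, 2)

dat[, c("id", "reason", "status", "time_yrs")]
```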

Other types of censoring

  • Left censoring: the event occurred before the subject came under observation (e.g. before the study started or before data were collected), so only an upper bound on the event time is known
  • Interval censoring: the event occurred between two known times (e.g. two clinic visits), but exactly when is not known
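
One way to picture all three types is as an interval \([lower, upper]\) known to contain the event time. The sketch below uses hypothetical values and an invented helper, `censoring_type()`. It sets the lower bound of a left-censored time to 0 because event times are positive.

```r
# What we know about each (hypothetical) subject's event time, in years:
# the event occurred somewhere in [lower, upper]
cens <- data.frame(
  lower = c(3.2, 5.0, 0,   2.0),
  upper = c(3.2, Inf, 1.5, 4.0)
)

# Hypothetical helper: classify each row by which bound carries the uncertainty
censoring_type <- function(lower, upper) {
  ifelse(lower == upper, "exact event time",
  ifelse(is.infinite(upper), "right censored",  # event after last follow-up
  ifelse(lower == 0, "left censored",           # event before first observation
         "interval censored")))                 # event between two known times
}

cens$type <- censoring_type(cens$lower, cens$upper)
cens
```

The {survival} package's `Surv()` function can also encode interval-censored data (see its `type` argument). The rest of this talk uses right-censored data only.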

Today we will focus only on right censoring.

Recall this plot

Analysis must account for censored patients

How would we compute the proportion who are event-free at 10 years?

  • Subjects 1 and 9 had the event before 10 years
  • Subjects 2 and 6 were event-free at 10 years
  • Subjects 3, 4, and 5 had the event after 10 years
  • Subjects 7, 8, and 10 were censored before 10 years
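
Two tempting shortcuts give different answers. The plot's exact times are not reproduced here, so the R sketch below assigns hypothetical times to subjects 1 to 10 that match the bullets above. It then computes the 10-year event-free proportion two ways: (1) dropping subjects censored before 10 years, and (2) treating them as event-free.

```r
# Hypothetical times (years) consistent with the bullets above; not the plot's real values.
# status: 1 = event, 0 = censored
dat <- data.frame(
  id     = 1:10,
  time   = c(6, 14, 12, 15, 11, 16, 3, 5, 8, 7),
  status = c(1,  0,  1,  1,  1,  0, 0, 0, 1, 0)
)
t0 <- 10

# Naive approach 1: drop subjects censored before t0 (ids 7, 8, 10)
keep <- !(dat$status == 0 & dat$time < t0)
prop_drop <- mean(dat$time[keep] > t0)                       # 5/7, about 0.714

# Naive approach 2: count subjects censored before t0 as event-free
prop_eventfree <- mean(!(dat$status == 1 & dat$time <= t0))  # 8/10 = 0.8

c(drop_censored = prop_drop, censored_as_event_free = prop_eventfree)
```

Neither shortcut is right. The first discards the follow-up that subjects 7, 8 and 10 did contribute. The second assumes none of them would have had the event before 10 years. With these made-up times, the Kaplan-Meier estimate at 10 years is \((7/8)(5/6) \approx 0.729\), which falls between the two naive answers.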

Additional reasons for survival analysis

  • Distribution of follow-up times is skewed
  • Distribution may differ between censored and event patients
  • Follow-up times are always positive
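
To see why skewness and positivity matter, suppose (purely as an assumption for illustration) that event times are exponential with a mean of 5 years. The median is then below the mean. A normal model with the same mean and standard deviation would assign about 16% probability to impossible negative times.

```r
# Assumption for illustration: exponential event times with mean 5 years
mean_t <- 5
rate   <- 1 / mean_t

median_t <- qexp(0.5, rate = rate)  # mean * log(2), about 3.47: median < mean, a right skew
sd_t     <- mean_t                  # for the exponential, the SD equals the mean

# A normal model with the same mean and SD puts probability on negative times
p_negative <- pnorm(0, mean = mean_t, sd = sd_t)  # equals pnorm(-1), about 0.159

c(mean = mean_t, median = median_t, p_negative_under_normal = p_negative)
```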

(A very small amount of) mathematical notation

To analyze survival data, we need to know the observed time \(Y_i\) and the event indicator \(\delta_i\). For subject \(i\):

  • Observed time \(Y_i = \min(T_i, C_i)\) where \(T_i\) = event time and \(C_i\) = censoring time
  • Event indicator \(\delta_i\) = 1 if event observed (i.e. \(T_i \leq C_i\)), = 0 if censored (i.e. \(T_i > C_i\))
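
In R, constructing \(Y_i\) and \(\delta_i\) from \(T_i\) and \(C_i\) might look like the sketch below. The event and censoring times are hypothetical; in a real study only \(Y_i\) and \(\delta_i\) are observed.

```r
# Hypothetical true event times T_i and censoring times C_i (years).
# Named T_event rather than T because T is shorthand for TRUE in R.
T_event <- c(2.5, 7.1, 4.0, 12.0, 3.3)
C_cens  <- c(5.0, 6.0, 8.0, 10.0, 1.2)

Y     <- pmin(T_event, C_cens)          # observed time: Y_i = min(T_i, C_i)
delta <- as.integer(T_event <= C_cens)  # event indicator: 1 if T_i <= C_i, else 0

data.frame(Y, delta)                    # Y: 2.5 6 4 10 1.2; delta: 1 0 1 0 0
```

With the {survival} package, these two columns are typically combined into a survival object with `Surv(Y, delta)`.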

Survival is observed as a step function

The survival function gives the probability that a subject survives beyond a specified time \(t\):

\[S(t) = Pr(T>t) = 1 - F(t)\]

  • \(S(t)\): survival function
  • \(F(t) = Pr(T \leq t)\): cumulative distribution function

In theory the survival function is smooth; in practice we observe events on a discrete time scale.
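
For fully observed (uncensored) times, the estimate of \(S(t) = 1 - F(t)\) is one minus the empirical CDF. This is a step function that drops only at observed event times. The sketch below uses hypothetical times and base R's `ecdf()`. As an assumed smooth comparison, it adds an exponential curve with rate 1/mean, which is the maximum-likelihood fit when nothing is censored. With censored data this simple estimate no longer works; the Kaplan-Meier estimator addresses that case.

```r
# Hypothetical, fully observed (uncensored) event times in years
event_times <- c(1.2, 2.5, 2.5, 4.0, 6.3, 7.1, 9.8)

F_hat <- ecdf(event_times)         # empirical CDF: a right-continuous step function
S_hat <- function(t) 1 - F_hat(t)  # empirical S(t) = Pr(T > t) = 1 - F(t)

# Smooth comparison, assuming an exponential model: S(t) = exp(-t / mean)
S_exp <- function(t) exp(-t / mean(event_times))

t_grid <- c(0, 2.5, 5, 10)
S_hat(t_grid)  # 1, 4/7, 3/7, 0: drops only at observed event times
S_exp(t_grid)  # decreases smoothly and never reaches 0
```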

Definitions

The survival probability at a certain time, \(S(t)\), is the probability of surviving beyond that time. It is built up from conditional probabilities. At each event time, the conditional probability of surviving beyond that time, given that an individual has survived just prior to it, is estimated as a ratio. The numerator is the number of patients at risk (alive and not yet censored) just prior to that time who did not have the event at that time. The denominator is the number of patients at risk just prior to that time.

The Kaplan-Meier estimate of survival probability at a given time is the product of these conditional probabilities up until that given time.
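
Written as a formula, this is the standard form of the Kaplan-Meier estimator. The symbols \(t_j\), \(d_j\) and \(n_j\) are introduced here:

  • \(t_1 < t_2 < \dots\): the distinct observed event times
  • \(d_j\): the number of events at \(t_j\)
  • \(n_j\): the number of subjects at risk (no event yet and not yet censored) just prior to \(t_j\)

\[\hat{S}(t) = \prod_{j:\, t_j \leq t} \left(1 - \frac{d_j}{n_j}\right)\]

with \(\hat{S}(t) = 1\) for \(t < t_1\).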

At time 0, the survival probability is 1, i.e. \(S(t_0) = 1\).
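
Putting the pieces together, here is a minimal base-R sketch of the Kaplan-Meier calculation on hypothetical data. `km_by_hand()` and `surv_at()` are invented names, not functions from any package. Subjects censored at the same time as an event are counted as still at risk at that time, which is the usual convention.

```r
# Hypothetical right-censored data (times in years; status 1 = event, 0 = censored)
time   <- c(6, 14, 12, 15, 11, 16, 3, 5, 8, 7)
status <- c(1,  0,  1,  1,  1,  0, 0, 0, 1, 0)

# Hypothetical helper: Kaplan-Meier table at each distinct event time
km_by_hand <- function(time, status) {
  t_j <- sort(unique(time[status == 1]))                         # distinct event times
  n_j <- sapply(t_j, function(t) sum(time >= t))                 # at risk just prior to t_j
  d_j <- sapply(t_j, function(t) sum(time == t & status == 1))   # events at t_j
  data.frame(time = t_j, n_risk = n_j, n_event = d_j,
             surv = cumprod(1 - d_j / n_j))  # product of conditional probabilities
}

# Hypothetical helper: evaluate the step function S(t); S(t) = 1 before the first event
surv_at <- function(km, t) {
  idx <- findInterval(t, km$time)  # number of event times <= t
  c(1, km$surv)[idx + 1]
}

km <- km_by_hand(time, status)
km                             # surv: 7/8, 35/48, 7/12, 7/16, 7/32
surv_at(km, c(0, 6, 10, 20))   # 1, 7/8, 35/48, 7/32
```

In practice you would use `survfit(Surv(time, status) ~ 1)` from the {survival} package, which should give the same estimates for these data.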

Example data

To access the example data used throughout this talk, install and load the {cancersimdata} package from my GitHub repo:

# If needed, install the remotes package first
# install.packages("remotes")
remotes::install_github("zabore/cancersimdata")
library(cancersimdata)

Connect with me

zabore2@ccf.org

https://www.emilyzabor.com/

https://github.com/zabore

https://www.linkedin.com/in/emily-zabor-59b902b7/

https://bsky.app/profile/zabore.bsky.social/

Further reading

Clark, T. G., Bradburn, M. J., Love, S. B., & Altman, D. G. (2003). Survival analysis part I: Basic concepts and first analyses. British Journal of Cancer, 89(2), 232-238.

Bradburn, M. J., Clark, T. G., Love, S. B., & Altman, D. G. (2003). Survival analysis part II: Multivariate data analysis – an introduction to concepts and methods. British Journal of Cancer, 89(3), 431-436.

Bradburn, M. J., Clark, T. G., Love, S. B., & Altman, D. G. (2003). Survival analysis part III: Multivariate data analysis – choosing a model and assessing its adequacy and fit. British Journal of Cancer, 89(4), 605-611.

Clark, T. G., Bradburn, M. J., Love, S. B., & Altman, D. G. (2003). Survival analysis part IV: Further concepts and methods in survival analysis. British Journal of Cancer, 89(5), 781-786.
